ABSTRACT

In April 1971 the Aslib Research and Development
Department began a study on selective dissemination from MARC tapes.
The aim of the project was to explore the technical and economic
feasibility of providing selective notifications of current books, by
extraction from MARC tapes, to specialized libraries. Typical of the
potential customers envisaged would be Aslib member organizations. A
comparison would be made of the utility of the various elements in
the MARC record as search keys. The project was planned in six
phases: (1) planning and program specification for MARC file creation
and searching; (2) programming; (3) exploratory work with test file;
(4) pilot operation with users; (5) analysis of results, conclusions,
report; and (6) market survey. This report covers the first five
phases of the project. (Author/SJ)

Introduction

In April 1971 the Aslib Research and Development Department
began a study on selective dissemination from MARC tapes.
The aim of the project was to explore the technical and
economic feasibility of providing selective notifications of
current books, by extraction from MARC tapes, to specialised
libraries. Typical of the potential customers envisaged
would be Aslib member organisations. A comparison would be
made of the utility of the various elements in the MARC
record as search keys.

The project was planned in six phases:

1 - planning and program specification for MARC file
    creation and searching

2 - programming

3 - exploratory work with test file

4 - pilot operation with users

5 - analysis of results, conclusions, report

6 - market survey

This report covers the first five phases of the project. The
first three phases were financed by a research contract from the
UK Office for Scientific and Technical Information, whose
support is gratefully acknowledged.

Our thanks are due particularly to the libraries that acted 
as our 'users' - without their help and cooperation we would 
have been unable to carry out this study. Brian Skinner of 
Cybernet Time-Sharing Ltd and the staff of the British National 
Bibliography have given great help. Work on this project has 
been carried out by Jane Wainwright, Jackie Hills, Brian Vickery 
and, in the first phase, also by Sumari Datta. 
2. Background and Summary

2.1 Other work on selective dissemination from MARC tapes

United Kingdom

During the summer of 1971 Aslib Research and Development
Department conducted a survey of British National
Bibliography (BNB) MARC tape users in the UK. From
this survey it appeared that four organisations were
carrying out selective dissemination from MARC tapes.
The majority of organisations (group I in the list below)
were using Dewey Decimal classification (DC) numbers or
ranges of numbers for selection. One organisation (II)
was using Library of Congress (LC) subject headings for
retrieval. A few organisations (III) had plans for future
work.

I    Trinity College, Dublin - profiles for social
     scientists, Irish government departments and
     industrial information centres

     Queen's University, Belfast - a few group profiles

     UKAEA, Aldermaston - a book pre-selection program,
     rather than SDI

II   Birmingham Libraries Cooperative Mechanization Project -
     profiles for social scientists

III  The City University, London

     The Polytechnic of North London

In all of this work the tapes are searched in the 'batch'
processing mode.



North America

The other work in this field is being carried out in the
US and Canada. An early study was carried out at Indiana
University by Studer (2), using MARC I records. The
experimental SDI service, to forty social scientists, used
weighted LC classification numbers and subject headings.

This was followed by an operational weekly MARC II SDI
service at the Oklahoma Department of Libraries (3). The
search keys used are DC and LC classification numbers or
ranges of numbers, and the service is being used by over
twenty groups in the US and Canada.

However, since February 1971 Canada has had its own service.
The Office of Technical Services Library, University of
Saskatchewan, in cooperation with the National Science
Library, National Research Council of Canada (4), has been
carrying out selective dissemination of MARC - the SELDOM
system. This is a highly flexible program, with seven
main search keys (personal name, corporate name, title,
DC and LC classification number, geographic code, date)
offering boolean and weighted logic, and various output options.

Mauerhoff (4) mentions work being done at Washington
University School of Medicine on searching by LC
classification numbers, and similar work at the University
of Florida, Yale University Library, and Harvard University
Library. From a survey of automated activities in US
libraries it appears that the University of Minnesota
Libraries and the National Center for Atmospheric Research,
Boulder, Colorado, are also studying this area.

The only report of on-line work comes from Syracuse University,
where a research project was carried out using MARC I
records with MOLDS, a generalized computer-based interactive
retrieval program, which allowed for many different
retrieval keys.



2.2 Brief description of project

If a service such as the one envisaged were run commercially
it would be carried out in batch mode. However, in this
exploratory work we were considering various types
of search key and various forms of profile and would
therefore be creating and modifying many profiles. The
quickest and most effective way of doing this is to work
in the on-line mode.

A specification for the requisite programs to handle
MARC format tapes, on-line, with teletype-compatible
terminal access, was prepared and submitted to a number
of commercial computer time-sharing bureaux. Quotations
were received from four bureaux; Cybernet Time Sharing
Limited was selected on the basis of their relevant
experience in on-line information retrieval work,
and because they were able to amend existing software,
thus offering the cheapest tender and the shortest
completion time.

The MARC ASlib (MARCAS) programs allow both weighted and
boolean searching on the Title and Author, LC classification
number and subject headings, and the BNB Precis indexing
terms and Reference Index Numbers, which uniquely identify
each Precis term. The DC number as such is not used for
searching, but a range of DC numbers may be employed to
refine a search.

Exploratory work with a test file was begun on completion
of the programming.

Twelve libraries, covering a range of subject fields, were 
contacted, and of these nine were used for the pilot 
operation. Profiles were constructed for these users and 
run on six weeks of BNB and six weeks of LC MARC tapes. 
The output from each weekly search was assessed for relevance
by the users, and in the final week a measure of the recall
of the system was also obtained.

The results of this pilot operation were then analysed
in terms of precision and recall, for various combinations
of the searchable fields. The best performance, with
precision and recall both about 50%, was given by searching
all verbal fields together: title and author, LC subject
headings and Precis indexing terms (BNB tapes only).

Costs for the on-line system were identified. 

A batch version of the MARCAS system was then implemented
and computer costs of £1.30 per library, per week, for
searching both BNB and LC tapes were calculated.

Further work which would be desirable background for
implementing a cost recovery service includes a survey
of book selection and acquisition needs and of the methods
currently used. Further study of available software,
including testing, would probably indicate a more efficient
system for operational use. The characteristics of standard
profiles, their usefulness and relation to a specialized
service also require more study.



Methodology

3.1 Programs

An analysis of BNB-MARC tapes was carried out to gain
information about various characteristics, such as the
numbers of fields present, their average length, the
numbers of records in certain subject fields, and
the co-occurrence of indexing words in the title, DC and
LC subject headings. Thought was also given to the fields
that should be searchable; to the type of profile
construction and the maximum number of terms that should
be included; and to the subject areas that should be covered.
This analysis led to the following decisions and
implementations.

Initially an item is selected from a MARC tape for further
processing only if it is in the English language, does
not have a Juvenile indicator, and does have a DC number.
This last constraint was based on the fact that only
low-level items are not assigned a DC number and, more
particularly, that DC offers the most concise method of
selecting subject areas. It was originally decided to
include only items starting with DC numbers 0, 3, 5, 6 or 7,
since our interests were to be concentrated in the science,
technology and social science fields. Items with DC
number 9 were later included, in order to cover geography.
(The only other possible field for subject area selection -
the LC classification number - is not present on about 15%
of the items, and so its use was not explored.) When the
items have been selected from the original tape they are
sorted into DC number order. Items assigned more than
one DC number are duplicated in the file - once for each
number (about 4% of the items are duplicates).
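
The selection rules above can be sketched in a few lines of Python. This is an illustration only: the `Record` fields, the `"eng"` language code and the function name are assumptions made for the example and are not taken from the MARCAS programs. DC numbers are assumed to be held as strings with a three-digit class, so that string order matches DC order.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Leading DC digits kept in the file: the original 0, 3, 5, 6, 7
# plus 9, which was added later to cover geography.
WANTED_DC_DIGITS = {"0", "3", "5", "6", "7", "9"}


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified view of one MARC record."""
    title: str
    language: str                # assumed language code, e.g. "eng"
    juvenile: bool               # True if the Juvenile indicator is set
    dc_numbers: Tuple[str, ...]  # every DC number assigned to the item


def build_search_file(records: List[Record]) -> List[Tuple[str, Record]]:
    """Select items and return (DC number, record) pairs in DC number order."""
    entries = []
    for rec in records:
        if rec.language != "eng" or rec.juvenile or not rec.dc_numbers:
            continue  # fails one of the three initial constraints
        for dc in rec.dc_numbers:
            if dc[:1] in WANTED_DC_DIGITS:
                entries.append((dc, rec))  # one entry per DC number
    entries.sort(key=lambda entry: entry[0])
    return entries
```

An item with two DC numbers in the wanted classes therefore appears twice in the sorted file, once under each number, which is the duplication described above.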

The following fields from each MARC record are stored on
disc, and those marked with an asterisk * are searchable:

 001 - International Standard Book Number (ISBN)

*050 - LC classification number

 082 - DC number

*245 - Title and Author

 260 - Place of publication, publisher and date

*650, 651 - LC subject headings

*690 - BNB Precis string

*692 - BNB Reference Index Number (RIN)

A range of DC numbers within which the search is carried out
can also be specified.

A 'compressed' file is formed from the searchable elements
in each record, in order to speed up the search process.
The compressed file acts as a filter in that it eliminates
non-hits, but each possible hit has to be checked against
the main text file. Both files are linear. The fairly
crude file structure is only feasible with small files.
The search files are no longer held in the MARC format,
which is basically a communications format. The file set-up
procedure is carried out with two programs which are
submitted for processing in remote job entry mode.
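
The filter-then-check idea can be illustrated as follows. The report does not say what the compressed file contains, so this sketch assumes, purely for illustration, that it holds the first four letters of every word in a record; the names `compress`, `matches` and `search` are likewise hypothetical. Only single-word terms are handled.

```python
from typing import List, Set

KEY_WIDTH = 4  # assumed length of the word fragment kept in the compressed file


def compress(text: str) -> Set[str]:
    """Reduce a record's searchable text to a set of short word prefixes."""
    return {word[:KEY_WIDTH] for word in text.upper().split()}


def matches(term: str, text: str) -> bool:
    """Full check on the main file: does any word start with the term?"""
    stem = term.upper()
    return any(word.startswith(stem) for word in text.upper().split())


def search(term: str, main_file: List[str],
           compressed_file: List[Set[str]]) -> List[int]:
    """Linear scan of the compressed file; possible hits are confirmed
    against the corresponding record in the main text file."""
    key = term.upper()[:KEY_WIDTH]
    hits = []
    for position, keys in enumerate(compressed_file):
        # Terms shorter than the key width cannot be filtered safely,
        # so every record is treated as a possible hit for them.
        possible = len(term) < KEY_WIDTH or key in keys
        if possible and matches(term, main_file[position]):
            hits.append(position)
    return hits
```

With records "Television receivers" and "Telephone exchanges", a search for TELEVIS passes the filter on both (both contain the key TELE) but only the first survives the check against the main file, which is why every possible hit still has to be verified.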

Searches can be made using either weighted or boolean logic
on one or any combination of the searchable fields. Up to
30 terms can be used in a search. Searches can be entered
directly from the keyboard or via paper tape or from a
disc file. The output may contain the whole of each
record that is stored, or any part of it that the searcher
wishes, and may be typed by the terminal or put onto disc
for printing in batch mode on the computer centre's line
printer. Figure 1 indicates the available options.

Further modifications have been made to the original
programs, due both to our experience and to Cybernet's
continuing interest, and the system now provides a wide
range of available options. The reader is referred to
Appendix I - MARCAS User Guide - for full details of our
current on-line retrieval system for MARC tapes.
Figure 1. Options for processing (labels legible in the
diagram: input of profile; search program (MARCAS))
3.2 Exploratory work

By November 1971 the programs were ready for use, together
with a test file, created from three weeks of BNB-MARC
tapes, comprising some 1250 items. Some time was spent
getting to know the system. It was obvious that the
wide range of options offered by the programs would
enable us easily to compare the use of different types of
profile on the same base and also, because of the immediate
response given by the system, that we would be able to test
and amend profiles quickly and efficiently. The programs
have proved fairly trouble-free for the whole duration of
the project.



3.3 Pilot Users

Twelve libraries were originally approached and of these
one did not reply, one proved too large to cope with and one,
although keen at first, was unable to cooperate in the
later stages of the project. Thus we had nine organisations
(2 industrial/commercial, 2 nationalised industry, 1 local
government, 2 research association/institute, 2 university
post-graduate).

The librarians were interviewed and a description of the main
subject field of each library obtained. (These covered
steel, plastics, electronics, instrumentation, computers,
medicine, law, business and television.) We also gained
an idea of the libraries' fringe interests; these varied
from marketing, management and computing to library studies.
We obtained an estimate of the number of book purchases
per annum - this varied from 100 to 2,500, with an average
of 650.



The librarians were asked about the sources they used for
book selection. Only three of the nine libraries used the
BNB Weekly List (one of these being the largest library).
All libraries received and used publishers' announcements
and catalogues, and most received specialised lists and
scanned book reviews. Other sources mentioned included
Bookseller, Times Literary Supplement, HMSO List and
Aslib Book List. The other major source is, naturally,
requests from readers. The librarians who used BNB
found it 'a bit late but useful nevertheless.' Most
libraries had some standing orders for series, etc.

We also obtained a list of recent book acquisitions.

The librarians or their assistants marked the three BNB Weekly
Lists corresponding to our test file at two levels of
relevance (1 - books they would buy, 2 - books they were
interested in knowing about).



3.4 Profile construction and testing

3.4.1 General

The profile construction aids which we had available
were:

    the 18th edition of the Dewey Decimal Classification

    the Library of Congress Classification

    card file of terms used in the 1971 Precis index,
    with their related RINs

    recent acquisitions lists from our test libraries

    three week BNB-MARC test file (held on-line)

The original plan had been to construct profiles
from each library's described interests, using the
first four aids above, and then to test these
profiles against the MARC test file. However, it
proved difficult to construct profiles in this
way because, on the whole, not enough information
was available. In fact the known relevant items
in the test file provided us with the best guide
for constructing profiles,* the other aids being
used mainly for producing synonyms and related
terms or concepts.

Profiles were constructed in 'natural language'
rather than in specific terms from DC or LC
classifications since we required to test all the
verbal fields for retrieval. We also felt that
we could handle with more confidence the broader
class formulations. In fact we found the LC
classification fairly difficult to deal with -
as did another researcher (7) who used only the LC
subject headings for profiling.

3.4.2 Natural language boolean profiles

Eventually, though this stage proved more time-consuming
than anticipated, natural language
boolean profiles were constructed for all libraries.
An example is shown in Figure 2:

'BIOCHEMIST' OR 'BIOCHEMICAL' OR 'BIOLOG' AND

('CHEMIST' OR 'DEVELOP' OR 'EXPERIMENT' OR 'PHYSIC'

OR 'CONTROL') AND NOT 'SCHOOL' OR 'BIOPHYSIC' OR

'BIOMEDICAL ENGINEER' OR 'ERGONOMIC' OR

'DESIGN ANTHROPOMETRY' OR 'MEDICAL' AND ('TECHNOLOG'

OR 'LABORATOR' OR 'EQUIPMENT') AND NOT 'PROFESSION'

9 

Figure 2 

Since each profile that can be entered in the
MARCAS system has a limit of 30 terms, several
profiles had to be constructed for each library
(the number varied from two to six, with an average
of four). In fact the average number of terms
needed to describe a library's interests was 100
(with a range from 50 to 150).
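
The arithmetic behind these figures is simple: with a limit of 30 terms, 100 terms need at least four profiles. A minimal sketch, assuming the terms are simply cut into consecutive groups (the function name is hypothetical; in practice related terms must stay together within a boolean expression, so more profiles may be needed than this minimum):

```python
from typing import List

MAX_TERMS_PER_PROFILE = 30  # limit stated for the MARCAS system


def split_into_profiles(terms: List[str],
                        limit: int = MAX_TERMS_PER_PROFILE) -> List[List[str]]:
    """Cut a library's term list into consecutive groups of at most `limit` terms."""
    return [terms[start:start + limit] for start in range(0, len(terms), limit)]
```

For a library at the bottom of the range (50 terms) this gives two profiles, and for the average library (100 terms) it gives four, in line with the numbers reported above.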

*UKCIS have also found that the examination of known
relevant items has proved the most useful aid in
profile construction.




The initial profile creation took about five days
for each library. These profiles were then tested
on-line against the three week test file (searching
on the Title and Author field (245), the LC subject
headings (650 and 651) and the BNB Precis string
(690)) and amended and refined as necessary. This
latter task was achieved very quickly and painlessly
with the on-line facilities. The profiles were
entered using paper tape.

3.4.3 Further types of profile

For five of the libraries further profiles were
constructed. An example of each type is illustrated
below.

i) Natural language, weighted - for searching the
245, 650/1, and 690 fields

'TELECOMMUNICAT'(6), 'TELEVISION'(6), 'TV'(6),
'T.V.'(6), 'RADIO'(6), 'RADIOCOMMUNICAT'(6),
'WIRELESS'(6), 'BROADCAST'(6), 'TRANSMITTER'(6),
'RECEIVER'(6), 'RECEIVING SET'(6), 'AERIALS'(6),
'ANTENNAS'(6), 'THYRISTORS'(?), 'ELECTRONIC EQUIPMENT'(6),
'PRINTED CIRCUIT'(6), 'STEEL'(3), 'STRUCTUR'(3),
'METRIC SYSTEM'(?), 'BUILDING'(?), 'POST OFFICE'(?),
'GREAT BRITAIN'(1), 'ANNUAL REPORT', 'SCHEDULING'(6);

Figure 3
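
A weighted profile of this kind can be evaluated by adding up the weights of the terms found in a record and comparing the total with a threshold. The sketch below is an assumption-laden illustration: the report does not give the threshold used or the matching rule, so both the `threshold` parameter and the simple substring match are hypothetical, as are the function names.

```python
from typing import Dict


def weighted_score(profile: Dict[str, int], text: str) -> int:
    """Sum the weights of all profile terms that occur in the text.
    Matching is a plain case-insensitive substring test (an assumption)."""
    upper = text.upper()
    return sum(weight for term, weight in profile.items() if term in upper)


def is_hit(profile: Dict[str, int], text: str, threshold: int) -> bool:
    """A record is selected when its score reaches the (hypothetical) threshold."""
    return weighted_score(profile, text) >= threshold
```

With weights such as 'TELEVISION'(6) and 'GREAT BRITAIN'(1) and a threshold of 6, a title on television in Great Britain scores 7 and is selected, while a title matching only 'STEEL'(3) is not. A negative weight, as used in the Reference Index Number profile, lowers the score of records containing an unwanted term.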



ii) LC classification numbers, weighted - for searching
the 050 field



'LC5P01 • , 'LCSP09' . 'LCS?!1S'. •lC5219\ 'LC5256' . 'LCfiS?* ' 
'LC6^?81 ' / PN 1991 '. 'PN 199? '. 'Prn993 \ Vn 199<^. ' . 'PNA7** ' 
'PN511* '/PN512* '/PN513* ','PN51fl* '/PN5150' 



Figure 4 
iii) Reference Index Numbers, weighted - for searching
the 692 field

•0OO?O^.072 '/n007n6?56'/0O3BonO?X \ '00?C10?00?'. 
•D0 3005C62 '/ 00 1 1 06;>5? '/0CM4t)1137 \ '00 190100^ • . 
'000102083', '0 00 1 0 325X \ ' CO 03 0 3 1 ?7 \ ' 00 17 0206? ' . 
'00 1fi05177' , 'C010U60^.5 \ 'n01001?0 5', 'OO in07(!^ 1 ' . 
'00 3106209 * , '0022080 16 ' / 0020 0 3 IBX ' , ' 00 0 20^, 0!^6 ' . 

•OOOl06030'(-5); 



Figure 5 



These profiles were also tested against the three
week test file and altered where required.

The three types of profile above, together with the
natural language boolean one (3.4.2), enabled us
to explore all our chosen searchable fields and
also to compare the usefulness of boolean logic
with weighted logic.



3.5 Running the profiles; assessments

The natural language boolean profiles for each library
were run on six weeks of BNB tapes and six weeks of
LC tapes, with no further amendments to the profiles.
This work was done on-line (with output onto a disc
for computer line-printer output) for our convenience,
although, if an operational system had existed, the task
could have been carried out equally well in batch mode.
Central processor and connection times for the searches
were recorded for two weeks.

The results of the first five weeks' searches (an example
of the output is shown in Figure 6) were sent to the
libraries for assessment at three levels of relevance:

1 - Important reference; books you would buy

2 - Relevant reference; books you are interested
    in knowing about

3 - Non-relevant

The users were also asked to mark those items they 
already knew of, so that we might get an idea of the 
currency of the tapes. 

For the final week the users were sent the complete 
BNB Weekly List and a copy of the LC file, and asked 
to mark relevant items either 1 or 2 as above. This 
enabled us to obtain a value for recall. Unfortunately 
one library did not have the time to mark up the last 
three weeks' LC files. 

The other types of profile (3.4.3) were run only on the 
last week of the BNB and LC tapes. The users were not 
sent the output from these searches since we already had 
their assessment of the total relevant items in these 
files. 
Results

4.1 Performance

4.1.1 General

The measures used to evaluate the performance of
the system are recall and precision (expressed
as percentages). The figures for each separate
library have been taken as the ratio of the total
values over the six weeks (on the assumption that
the results are reasonably homogeneous over the six
weeks) (section 4.1.2). For the figures in sections
4.1.3 and 4.1.4 only the final week's values are
concerned. The overall figures for all libraries
have been taken as the average of the ratios of each
library (because each library's figures should be
allowed to have an equal effect on the final figures
and in fact some libraries had large outputs and
some small). Rather than being left out of the
calculations, recall and precision ratios of 0/0
have been taken as 100%, since the relationship
between the first and second levels of relevance then
makes more sense. In Tables 1-4 the actual number
of relevant retrieved items has been noted.

The results fall into three main groups. The 
first covers precision figures for all libraries 
over six weeks on natural language boolean searches, 
the second precision and recall figures for all 
libraries for the final week on natural language 
boolean searches, and the third precision and 
recall figures for five libraries for the final 
week for various types of search. 

4.1.2 Precision figures (Table 1)

Table 1 shows precision figures for natural language
boolean searches for each library over six weeks on
BNB and LC tapes. The precision figures are given
at two levels of relevance -

P1 - is the ratio (expressed as a percentage) of the
     relevant retrieved items at relevance level 1
     (see section 3.5) to the total retrieved items

P2 - is the ratio of the relevant retrieved items
     at relevance level 1 plus 2 to the total
     retrieved items.

Average values over all libraries have been calculated.
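
The calculation described in 4.1.1 and 4.1.2 can be sketched as follows: each library's precision is the ratio of its totals over the weeks, a ratio of 0/0 counts as 100%, and the overall figure is the plain average of the library ratios. The function names and the layout of the weekly counts are assumptions made for this illustration.

```python
from statistics import mean
from typing import List, Tuple


def ratio_pct(numerator: int, denominator: int) -> float:
    """Ratio as a percentage; 0/0 is taken as 100%, as in the report."""
    if denominator == 0:
        return 100.0
    return 100.0 * numerator / denominator


def library_precision(weeks: List[Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (P1, P2) for one library.

    Each week is (items assessed at level 1, items assessed at level 2,
    total items retrieved). Totals are summed before the ratio is taken."""
    level1 = sum(week[0] for week in weeks)
    level2 = sum(week[1] for week in weeks)
    retrieved = sum(week[2] for week in weeks)
    return ratio_pct(level1, retrieved), ratio_pct(level1 + level2, retrieved)


def overall_average(library_values: List[float]) -> float:
    """Average of the per-library ratios, each library weighted equally."""
    return mean(library_values)
```

As a check against Table 1, averaging the nine P2 values for the LC tape with title and subject headings searched together (81.9, 56.6, 23.1, 39.2, 50.0, 31.7, 13.8, 58.9, 31.1) gives 42.9, the average printed in the table.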



BNB tape searches

The fields searched were the Title and Author (245),
the LC subject headings (650 and 651) and the Precis
indexing terms (690). Precision figures for
searching various combinations of these fields have
been identified. If we consider the precision
averages P2, there is an increase in precision on
searching the 650/1 field as opposed to the 245 field,
from 59.1% (245) to 65.8% (650/1). When all three
'verbal' fields (i.e. including the 690) are searched
the precision drops a little (to 55.7%) but this is
probably compensated by an increase in recall (see
4.1.3). Overall recall figures are not available
because the task of obtaining these figures would
have put too great a burden on the users. However,
the relative recall (RR2) of the 245 and 650/1
fields to the total number of relevant items
retrieved by searching all three fields (at relevance
levels 1 plus 2) is seen as 55.3% and 62.1%
respectively.
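
Relative recall as defined here needs only the sets of relevant items retrieved, not the full set of relevant items on the tape. A minimal sketch, assuming items are identified by some record key and that the 0/0 convention of section 4.1.1 also applies to this ratio (the function name and both assumptions are illustrative):

```python
from typing import Set


def relative_recall(field_hits: Set[str], all_field_hits: Set[str]) -> float:
    """Relevant items retrieved by one field, as a percentage of the relevant
    items retrieved when all three verbal fields are searched together."""
    if not all_field_hits:
        return 100.0  # 0/0 taken as 100% (assumed to apply here as well)
    found = field_hits & all_field_hits
    return 100.0 * len(found) / len(all_field_hits)
```

For example, if searching all three fields retrieves four relevant items and the Title field alone retrieves two of them, the relative recall of the Title field is 50%.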



LC tape searches

The Precis terms are not present on the American tapes
so we can only compare searches on the 245 and 650/1
fields. It is interesting to note that although
the average precision value P1 is fairly low (about
10%) the value for P2 is nearly 50%. In other
words, though libraries may not wish to buy many of
these American books, they are glad to have
information about them (see also 4.1.5 - Currency).



Table 1. Natural language boolean searches over six weeks. 
Fields searched: T = Title; LC = LC subject headings; Pr = Precis*

BNB tape

                Precision                                                       Relative recall
Fields                  T+LC+Pr*               T              LC            T+LC       T      LC
Library               P1      P2      P1      P2      P1      P2      P1      P2     RR2     RR2
1                    8.5    84.5    10.8    83.8  illeg.    90.5     8.5    84.7    51.7    63.3
2                   16.4  illeg.  illeg.  illeg.  illeg.    77.7  illeg.    75.0    46.2    43.6
3                   11.5    15.4    25.0    33.0    30.0    30.0    15.8    21.1   100.0    75.0
4                   19.6    47.8    20.8    50.0    29.4    64.7    20.0    56.0    54.5    50.0
5                   28.3    63.0    43.3    83.3    43.3    80.0    31.7    65.9    86.2    82.8
6                   58.7    71.6    74.8    86.9    69.1  illeg.    70.5    85.5    49.2    60.8
7                    4.8    20.6     5.3    15.8     8.6    28.6     5.4    21.4    46.2    76.9
8                   72.5    91.3    50.0    66.7    82.8    93.1    72.7    87.9     6.3    42.9
9                    8.1    36.0    10.8    37.8     9.9    43.7     7.9    36.8    57.1    63.3
Sum                228.4   501.1   278.3   532.3   314.4   592.5   257.5   534.3   497.4   558.6
Average             25.4    55.7    30.9    59.1    34.9    65.8    28.6    59.4    55.3    62.1
Total rel. retr.     259     468     127     221     160     276     187     344     221     276

LC tape

                Precision                                                       Relative recall
Fields                  T+LC+Pr*               T              LC            T+LC       T      LC
Library               P1      P2      P1      P2      P1      P2      P1      P2     RR2     RR2
1                      0    81.9       0    82.1       0    92.6                    60.4    82.4
2                   17.0    56.6    26.7    66.7    25.0    62.5                    66.7    66.7
3                    7.7    23.1    10.0    30.0    14.3    42.9                   100.0   100.0
4                   10.8    39.2    12.9    40.3    13.7    41.2                    86.2    72.4
5                   20.4    50.0    30.0    60.0    27.8    52.8                    66.7    70.4
6                    1.9    31.7       0    30.1     2.5    36.4                    43.1    86.3
7                    3.7    13.8     5.7    20.0     4.5    15.7                    46.7    93.3
8                   18.9    58.9    20.0    60.0    18.4    69.4                    48.2    60.7
9+                     0    31.1       0    42.9       0    32.6                    63.2    78.9
Sum                 80.4   386.3   105.3   432.1   106.2   446.1                   581.2   711.1
Average              8.9    42.9    11.7    48.0    11.8    49.6                    64.6    79.0
Total rel. retr.      54     321      37     189      42     245                     189     245
4.1.3 Precision and recall figures (Table 2)

Table 2 shows precision and recall figures for
natural language boolean searches for each library
for the final week only, on BNB and LC tapes. The
figures are at two levels: P1, R1 (precision and
recall at relevance level 1) and P2, R2 (precision
and recall at relevance levels 1 plus 2). Average
values over all libraries have been calculated.

BNB tape searches

Searches over all verbal fields at relevance levels
1 plus 2 give 42.0% precision and 56.3% recall.
Searches on the 245 field alone improved precision but
lowered recall (44.8% and 34.3% respectively), while
those on the 650/1 field improved precision even
more and did not lower recall so much (50.8% and
42.1% respectively). Recall is obviously being
increased (RI in Table 2) by the use of the 690
field - in fact by 42.1% at relevance level 1 and
24.2% at relevance levels 1 plus 2.

LC tape searches 

Precision and recall figures for searches on the LC 
tapes show the same trends as those on BNB tapes. 
Searching both the 245 and the 650/1 fields gives 
precision and recall of 46.3% and 59.7% respectively; 
searching the 245 field alone gives 51.3% and 44.8% 
respectively; searching the 650/1 field gives 
49.8% and 44.7% respectively. These are the figures 
for relevance levels 1 plus 2 - those for relevance 
level 1 are much lower. 
Table 2. Natural language boolean searches for final week

Fields searched: T = Title; LC = LC subject headings; Pr = Precis*.
RI1, RI2 = recall increase using Precis field.

BNB tape, relevance levels 1 plus 2

Fields                  T+LC+Pr*               T              LC
Library               P2      R2      P2      R2      P2      R2     RI2
1                   61.5    80.0    57.1    40.0    71.4    50.0    30.0
2                   60.0    42.9       0       0    33.3    14.3    28.6
3                      0       0       0       0       0       0       0
4                   30.0   100.0    66.7    66.7    75.0   100.0       0
5                   81.8    75.0    81.8    75.0    80.0    66.7       0
6                   71.4    25.5    90.9    10.2    87.5    14.3     9.2
7                      0   100.0       0   100.0       0   100.0   100.0
8                   66.7    50.0   100.0       0   100.0       0    50.0
9                    6.9    33.3     7.1    16.7    10.0    33.3       0
Sum                378.3   506.7   403.6   308.6   457.2   378.6   217.8
Average             42.0    56.3    44.8    34.3    50.8    42.1    24.2
Total rel. retr.      52      52      25      25      33      33      16

BNB tape, relevance level 1 (per-library values illegible)

Fields                  T+LC+Pr*               T              LC
                      P1      R1      P1      R1      P1      R1     RI1
Sum               illeg.   591.3   239.4   510.8   197.5   509.2   379.0
Average             15.7    65.7    26.6    56.8    21.9    56.6    42.1
Total rel. retr.      21      21      10      10  illeg.  illeg.  illeg.

LC tape, relevance levels 1 plus 2

Fields                      T+LC               T              LC
Library               P2      R2      P2      R2      P2      R2
1                   55.6    71.4    70.0    50.0    63.6    50.0
2                   38.5    83.3    50.0    66.7    33.3    33.3
3                  100.0   100.0   100.0   100.0   100.0   100.0
4                   35.0    70.0    38.9    70.0    33.3    50.0
5                   50.0    30.0    62.5    25.0    50.0    20.0
6                   34.6    36.0    22.2     8.0    45.0    36.0
7                   13.0    50.0    16.7    16.7    27.3    50.0
8                   43.5    37.0    50.0    22.2    45.5    18.5
9+
Sum                370.2   477.7   410.3   358.6   398.0   357.8
Average             46.3    59.7    51.3    44.8    49.8    44.7
Total rel. retr.      51      51      33      33      36      36

LC tape, relevance level 1 (per-library values illegible)

Fields                      T+LC               T              LC
                      P1      R1      P1      R1      P1      R1
Sum               illeg.   460.5   104.2   388.5    91.5   363.1
Average             10.9    57.6    13.0    48.6    11.4    45.4
Total rel. retr.      15      15      10      10  illeg.  illeg.

* Field not present on LC tapes

+ Assessment not given
4.1.4 Precision and recall for various searches

BNB tape searches (Table 3)

Precision and recall figures (P2, R2) for relevance
levels 1 plus 2 are given for natural language
boolean and weighted searches, for LC classification
number (field 050) searches (using weighted logic)
and for Reference Index Number (RIN) (field 692)
searches (weighted logic). These searches were
carried out on the final week's tape and for only
five libraries; average values over these five
are calculated.

For the natural language searches precision and
recall values are given for all the various
combinations of searching the 245, 650/1 and 690
fields. For both boolean and weighted logics the
combination of 650/1 and 690 searching produced the
best results, while searching on the 245 field alone
gave the worst performance. Boolean logic performed
slightly better than weighted logic. Searches on
the LC classification numbers gave an increase in
precision, but a decrease in recall. Searches on
the RIN produced lower precision and lower recall.

LC tape searches (Table 4)

Precision and recall figures are given for natural
language boolean and weighted searches and for
LC classification number searches. These searches
were carried out on the final week's tape and for
five libraries; one library failed to return the
relevance assessment so the average values are
calculated over four libraries.

The values here are much higher than those for
similar BNB tape searches. This is partly due to
library 3, which had no relevant items on the BNB
tape and only one (which was retrieved) on the LC
tape, and also to library 5, which had low values
for the BNB tape and gave no figures for the LC




Table 3. Various searches for BNB final week.

Precision (P2) and recall (R2), per cent, by library and fields searched.
T = Title (245); S = LC Subject headings (650/1); P = Precis (690).

NATURAL LANGUAGE BOOLEAN SEARCH

Library         T+S+P           T             S             P            T+S           T+P           S+P
                 P2     R2     P2     R2     P2     R2     P2     R2     P2     R2     P2     R2     P2     R2
1              61.5   80.0   57.1   40.0   71.4   50.0   70.0   70.0   50.0   50.0   58.3   70.0   72.7   80.0
2              60.0   42.9      0      0   33.3   14.3   60.0   42.9   33.3   14.3   60.0   42.9   60.0   42.9
3                 0      0      0      0      0      0      0      0      0      0      0      0      0      0
4              30.0  100.0   66.7   66.7   75.0  100.0   22.2   66.7   75.0  100.0   22.2   66.7   30.0  100.0
5               6.9   33.3    7.1   16.7   10.0   33.3   14.3   33.3    7.7   33.3    9.5   33.3    8.3   33.3
Sum           158.4  256.2  130.9  123.4  189.7  197.6  166.5  212.9  166.0  197.6  150.0  212.9  171.0  256.2
Average        31.7   51.2   26.2   24.7   38.0   39.5   33.3   42.6   33.2   39.5   30.0   42.6   34.2   51.2
Total
rel. ret.        16     16      7      7     11     11     14     14     11     11     14     14     16     16

NATURAL LANGUAGE WEIGHTED SEARCH

Library         T+S+P           T             S             P            T+S           T+P           S+P
                 P2     R2     P2     R2     P2     R2     P2     R2     P2     R2     P2     R2     P2     R2
1              60.0   60.0   50.0   40.0  100.0   40.0  100.0   70.0   50.0   50.0   58.3   70.0   75.0   90.0
2              40.0   28.6      0      0   33.3   14.3   50.0   28.6   25.0   14.3   40.0   28.6   50.0   28.6
3                 0      0      0      0      0      0      0      0      0      0      0      0      0      0
4              30.0  100.0   22.2   66.7   30.0  100.0   22.2   66.7   30.0  100.0   22.2   66.7   30.0  100.0
5               7.4   33.3      0      0   15.4   33.3   13.3   33.3   10.5   33.3    8.7   33.3   10.5   33.3
Sum           137.4  221.9   72.2  106.7  178.7  187.6  185.5  198.6  115.5  197.6  129.2  198.6  165.5  251.9
Average        27.5   44.4   14.4   21.3   35.7   37.5   37.1   39.7   23.1   39.5   25.8   39.7   33.1   50.4
Total
rel. ret.        13     13      6      6     10     10     13     13     11     11     13     13     16     16

LC CLASSIFICATION NO. AND RIN WEIGHTED SEARCHES

Library       LC Classif. No.      RIN
                 P2     R2     P2     R2
1              66.7   66.7   54.5   60.0
2              66.7   28.6   50.0   42.9
3                 0      0      0      0
4              66.7   66.7   33.3   33.3
5              14.3   33.3   15.4   33.3
Sum           214.4  195.3  153.2  169.5
Average        42.9   39.1   30.6   33.9
Total
rel. ret.        12     12     12     12



Table 4. Various searches for LC final week.

Precision (P2) and recall (R2), per cent, by library and fields searched.
T = Title (245); S = LC Subject headings (650/1).

NATURAL LANGUAGE BOOLEAN SEARCH

Library          T+S            T             S
                 P2     R2     P2     R2     P2     R2
1              55.6   71.4   70.0   50.0   63.6   50.0
2              38.5   83.3   50.0   66.7   33.3   33.3
3             100.0  100.0  100.0  100.0  100.0  100.0
4              35.0   70.0   38.9   70.0   33.3   50.0
5 +
Sum           229.1  324.7  258.9  286.7  230.2  233.0
Average        57.3   81.2   64.7   71.7   57.6   58.3
Total Relevant
Retrieved        23     23     19     19     15     15

NATURAL LANGUAGE WEIGHTED SEARCH

Library          T+S            T             S
                 P2     R2     P2     R2     P2     R2
1              71.4   71.4   70.0   50.0   70.0   50.0
2              45.5   83.3   66.7   66.7   33.3   33.3
3              16.7  100.0   50.0  100.0   33.3  100.0
4              34.6   90.0   34.6   90.0   33.3   70.0
5 +
Sum           168.2  344.7  221.3  306.7  169.9  253.3
Average        42.1   86.2   55.3   76.7   42.5   63.3
Total Relevant
Retrieved        25     25     21     21     17     17

LC CLASSIFICATION NO. WEIGHTED SEARCH

Library       LC Classif. No.
                 P2     R2
1              83.3   35.7
2              40.0   33.3
3              50.0  100.0
4              58.3   70.0
5 +
Sum           231.6  239.0
Average        57.9   59.8
Total Relevant
Retrieved        15     15

+ Assessment not given

However it can be seen that searching the 245
and 650/1 fields in combination gave the best
performance for natural language searches, with
boolean searching slightly ahead of weighted
searching. Again LC classification number searches
increased precision, though only marginally, but
decreased recall.
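
The report does not reproduce the exact matching rules of the search programs, so the following is only a hedged sketch of how a boolean profile and a weighted profile might be applied to the same fields. The record layout, the profile structures, the sample terms and the threshold are all assumptions made for illustration.

```python
def field_text(record, fields):
    """Join the chosen MARC fields of a record into one lower-case string."""
    return " ".join(record.get(tag, "") for tag in fields).lower()


def boolean_match(record, fields, groups):
    """Boolean logic: hit if ALL terms of at least ONE group are present."""
    text = field_text(record, fields)
    return any(all(term in text for term in group) for group in groups)


def weighted_match(record, fields, weights, threshold):
    """Weighted logic: hit if the summed weights of the terms present
    reach the threshold."""
    text = field_text(record, fields)
    score = sum(weight for term, weight in weights.items() if term in text)
    return score >= threshold


# Hypothetical record and profiles.
record = {
    "245": "Corrosion of steel in marine environments",
    "650": "Steel -- Corrosion",
}
boolean_profile = [["corrosion", "steel"], ["rust"]]
weighted_profile = {"corrosion": 2, "steel": 1, "paint": 3}

print(boolean_match(record, ["245", "650"], boolean_profile))
print(weighted_match(record, ["245"], weighted_profile, threshold=3))
```

Restricting the `fields` argument to a single tag corresponds to the single-field columns of the tables; passing several tags corresponds to the combined-field columns.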

Venn diagrams (Figure 7)

Another way of considering the results is by the
use of Venn diagrams. The results of searching
the final week's BNB and LC tapes by natural
language boolean profiles are shown for four
libraries. All the retrieved items have been
entered in the appropriate relevant (relevance
level 1 plus 2) or non-relevant diagrams. The
sample used is rather small but it is interesting
to note that in the BNB searches 86% of the
relevant items could have been retrieved by
searching only the 690 field and the additional
14% by searching the 650 field. The 690 field
search also produces 85% of the non-relevant
items, but since the precision is fairly high this
should not be considered much of a disadvantage.
For the LC searches however it appears that
searching the title (245 field) will give 82% of
the relevant items (and 66% of the non-relevant
items).
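
The percentages read off the Venn diagrams can be thought of as field coverage: the share of retrieved items that a single field would have found on its own. A minimal sketch, with made-up data (the item list below is an assumption for illustration, not the project's results):

```python
def coverage(hit_fields, field):
    """Percentage of items whose set of matching fields includes `field`."""
    found = sum(1 for fields in hit_fields if field in fields)
    return 100.0 * found / len(hit_fields)


# Hypothetical retrieved-and-relevant items, each recorded as the set of
# fields in which the profile matched.
relevant_hits = [
    {"690"}, {"245", "690"}, {"650", "690"}, {"690"},
    {"245", "650", "690"}, {"690"}, {"650"},
]

share_690 = coverage(relevant_hits, "690")
print(round(share_690))
```

The items not covered by the chosen field are those that a search on the remaining fields would still have to supply.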



4.1.5 General figures (Table 5)

Currency

Our attempt to obtain an idea of the currency of
the service should not be considered too strictly
as we were not running the tapes immediately after
Figure 7. Natural language boolean searches for final week.
Totals for libraries 1, 2, 3, 4.

[Venn diagrams: retrieved relevant items and retrieved
non-relevant items. BNB tape: Title (245), LC Subject
headings (650/1), Precis (690). LC tape: Title (245),
LC Subject headings (650/1).]
Table 5. General figures.

Currency: % already seen at each relevance level (I, R, N).
Items scanned: per week ratio, items scanned/items
retrieved/items retrieved and relevant.

BNB Tapes

Library         I       R       N    Items scanned
1            16.7     3.7       0    300/12/10
2            88.9     6.7       0    251/9/7
3            66.7       0       0    251/4/1
4            22.2       0       0    142/8/4
5            92.3     6.3       0    300/8/5
6            89.7    41.2    50.7    411/44/32
7             100    30.0       0    155/11/2
8            96.0    53.8       0    158/12/11
9            45.5     2.6       0    300/23/8
Sum         618.0   144.3            2268/131/80
Average      68.7    16.0            252/15/9

LC Tapes

Library         I       R       N    Items scanned
1              0*     4.4     5.0    496/19/15
2               0     4.8       0    433/9/5
3             100       0       0    433/2/1
4            12.5       0       0    188/12/5
5            27.3       0       0    496/9/5
6            33.3       0       0    676/27/9
7            25.0       0       0    219/18/3
8            33.3       0       0    308/16/9
9              0*       0       0    496/20/3
Sum         231.1     9.2            3745/132/55
Average      33.0     1.0            416/15/6

* Figures have been left out of the average.

their production so, at least, those libraries
receiving the BNB Weekly List should already have
noted relevant items. However the averages (taken
over five weeks) indicate that although libraries
had already seen mention of 68.7% of the British
books that they would in fact buy, they had only
noted 16.0% of those they were interested in
knowing about. (One library - number 6 - had
already seen 50% of their non-relevant books!)
As suspected fewer of the American books were
previously known - 33.0% of important books and
only 1% of interesting books. This latter is
particularly remarkable since the precision value of
these books is quite high.

Items scanned 

We also considered the number of items that a library
would have to look at in the book selection process.
If we assume that a librarian will look through the
whole of each Dewey Decimal class that he is
interested in from the BNB Weekly List (this is
probably not strictly true but is the only possible
assumption that one can make here) then we can find
the average number of items he would have to scan
each week. We can also calculate the average
number of items that are retrieved for one library
each week and which of these are relevant. We
arrive at an average ratio per week for items
scanned : items retrieved : items retrieved and
relevant of 252:15:9 for British books and 416:15:6
for American books. (The averages are taken over six
weeks.)
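
As a check on the arithmetic, the average ratios can be recomputed from the totals given for the nine libraries in Table 5. This is a sketch only; the function name is an assumption, and rounding to whole items is assumed to be how the published averages were obtained.

```python
def average_ratio(scanned, retrieved, relevant, libraries):
    """Average per-library weekly figures, rounded to whole items."""
    return tuple(round(total / libraries)
                 for total in (scanned, retrieved, relevant))


# Totals over the nine libraries, as given in Table 5.
bnb = average_ratio(2268, 131, 80, libraries=9)
lc = average_ratio(3745, 132, 55, libraries=9)
print(bnb, lc)
```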
4.2 Times and costs for on-line system

Our system has so far mainly been working in the on-line
mode and any proposed commercial service would be
in batch mode. Thus costs for this system, although
interesting, should not be taken as an indication of
the costs for a service. For estimates of batch
processing costs the reader should refer to section 5.

4.2.1 Weekly running costs 

For the first two weeks' runs on both BNB and LC
tapes of the natural language boolean searches,
the computer usage time was noted (Table 6).
This time is composed of central processor unit
(CPU) time and connect or log-on time. For BNB
tapes the average value for one library for one week
was 42 seconds CPU time and 21 minutes connect
time; for LC tapes it was 35 seconds CPU time and
18 minutes connect time (these figures may be
lower than average as the first two LC tapes had
fewer items than later ones). If we used Cybernet's
most expensive rates (we were paying considerably
less in fact) of £4.80 per minute CPU and £2.70
per hour connect we would have an average cost of
about £4 per week for each library for each tape.
The cost we were paying was in fact about £1.70
per library per tape.

However, a new file structure was implemented towards
the end of the project, and although all the
processing had been carried out by this time, a
few tests were run on these new files (see last
column, Table 6). These figures indicate that the
CPU time for searching would be halved at least.
This would give an average cost of £2.60 per week
for each library, for each tape, run on-line
(at the highest charge rate).
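
The cost figures quoted above can be reproduced from the stated times and rates. A minimal sketch, using the highest charge rates given in the text; the function and parameter names are illustrative assumptions.

```python
def weekly_cost(cpu_seconds, connect_minutes,
                cpu_rate_per_minute=4.80, connect_rate_per_hour=2.70):
    """Cost in pounds of one library's search of one tape."""
    cpu_cost = cpu_seconds / 60 * cpu_rate_per_minute
    connect_cost = connect_minutes / 60 * connect_rate_per_hour
    return cpu_cost + connect_cost


bnb_cost = weekly_cost(42, 21)          # original files, BNB tape
lc_cost = weekly_cost(35, 18)           # original files, LC tape
bnb_new_files = weekly_cost(42 / 2, 21) # CPU time halved, connect unchanged
print(round(bnb_cost, 2), round(lc_cost, 2), round(bnb_new_files, 2))
```

At these rates the CPU charge dominates, which is why halving the CPU time brings the weekly figure down from about £4 to about £2.60 even though the connect time is unchanged.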

Table 6. Weekly running costs.

                Computer time         Final week,
                                      new files
Library       CPU       Connect       CPU
              (secs)    (mins)        (secs)

BNB Tapes
1               56         28           24
2               40         20            9
3               41         17           11
4               15          6            9
5               46         20           14
6               82         42           31
7               23         11           12
8               19         14            4
9               54         30           28
Total          376        188          142
Average         42         21           16

LC Tapes
1               46         25           26
2               33         15           15
3               30         15           13
4               16          7           11
5               38         15           19
6               62         30           35
7               24         14           16
8               18         12           13
9               50         25           31
Total          317        158          179
Average         35         18           20

The other weekly cost that can be easily
identified is the cost of setting up the files,
the work being carried out partly on-line and
partly in remote job entry mode. This worked
out at about £20 for each tape (again using
the highest rates), or £15 at the rates we
were paying. (The new file structure mentioned
above would not alter this figure.)

4.2.2 Times for searching various fields

Though work has been done on the performance
efficiency of searching the various fields available,
not much effort has been devoted to considering the
relative speeds of searching these fields because
there are so many variables in the system. For a
discussion of these variables in relation to the
Scisearch system run by Cybernet (of which our
system is an adaption) the reader should consult
a report by Datta and Robertson. The main
factors affecting the search times in our system are

i)    number of terms, length of terms and logic used
      in profile
ii)   number of items searched
iii)  number of fields searched
iv)   number of hits
v)    number of fields printed out
vi)   usage of disc input or output
vii)  time of day that search was conducted
      (the computer system is more heavily used at
      some times)

If we keep all these factors except the number of
fields searched (iii) and the number of hits (iv)
constant we can get an idea of the relative times
needed to search the different fields.




From a small sample it appears that there is no 
great difference in search times for the 245, 650/1 
and 690 fields - though possibly the 650/1 has 
slightly shorter times and the 690 slightly longer. 
Searching all these three fields together is only 
one third more time-consuming than searching any one 
field on its own. Weighted searching is slower 
than boolean - a fact probably due to the way 
the programs are written (they were originally 
only set up to do boolean searches). Both LC
classification number (050) and RIN (692) searches 
are much faster than searches on the verbal fields, 
but both these types of search give low performance 
figures. 

4.2.3 Times for scanning BNB Weekly List 

Most librarians took about thirty minutes per week 
to scan this list. 

Relations with a bureau 

Most of the time-sharing bureaux in London were contacted
at the beginning of this project but only Cybernet
Time-sharing were able to offer any previous experience
in the field. They adapted their Freesearch programs,
written in Fortran, at a cost of £500 - the main
modification made it possible to convert the MARC records
into a form which could be used by the search program.
We then used the programs on their time-sharing system at
a cost of £150 per month for 30 hours connect time
(including CPU time) and paying about £90 per month for
disc storage. The Freesearch programs are available for
purchase from Cybernet or for use on a bureau basis on
their Sigma 9.

Our MARCAS system was cheap and quick to set up and ran
fairly error free. It kept well within our cost and time
estimates - the response time of the on-line system was
very rarely more than a few seconds and was usually
immediate (this depended on the number of people using
the computer and thus tended to be longer in the afternoons
and towards the end of the week). The good service
given by Cybernet and their interest in the project
have amply justified our choice of bureau.
5. Use of batch system

The running costs of the system used in this project have
been discussed in an earlier chapter.

Obviously batch systems must be considered for an on-going
service and costed in order to provide a basis for
planning a pilot service.

5.1 Other batch systems in the UK

Several systems offer MARC-based book notification.
In the UK these are basically in-house operations
with a few subscribers in related institutions.
For instance, the Birmingham Cooperative Libraries
Mechanisation Project gave a service for some time
to the Bath University Social Science Information
Officer [7]. Bumpus, Haldane and Maxwell Ltd and
Richard Abel and Company [9] both offer book notification
services for their customers, which are in both cases
free of charge although there is a stated obligation
to purchase some books from the supplier.

B, H & M Ltd's brochure says "As you are no doubt
aware, it costs a great deal of money to provide this
bibliographic service. We would therefore appreciate
it if you make positive arrangements to order from
us at least those publications that have been brought
to your notice by Maxwell's bibliographic service."
5.2 Costs of other systems

It is worth examining what information is readily
available concerning the costs of operating a batch
system using BNB or LC MARC tapes and seeing if
these can be applied in our case. Unfortunately we
have not located much data expressed in computer time
which could be translated to another environment.
Mauerhoff gives detailed costs but does not give the
times used nor the rates charged for his computer,
which is an IBM S360/50 with 256K bytes of memory.
Using the SELDOM programs which they developed, the
cost per profile for running 82 profiles is $116
annually for searching Library of Congress tapes each week.
A charge of $0.37 is made for alterations in profiles [4].

The Aldermaston AMCOS system selects items from the
MARC tapes using 58 DC numbers as search keys. Selection
and printing the 10% of the files which are 'matches'
(about 200 items per week) costs £18 each week using an
IBM S360/75, which involves 15 minutes elapsed computer
time and 1.08 minutes CPU time. £6 of the £18 was
for translating the tapes from the ASCII code of the
MARC tapes to EBCDIC, the internal IBM code. £12
of the £18 is comparable to one of our library profiles.

5.3 MARCAS batch system costs

A batch version - or, more correctly, a remote entry
for batch processing - of the MARCAS programs became
available in November, 1972. By rerunning some of
the profiles on the last week of our test data in
batch, we arrived at computer costs of £1.30 per
library for searching Library of Congress and BNB
tapes.

Using the new MARCAS programs would entail small
development costs in programming. One additional
small program is required to reformat the "hit"
records. When usage reaches 10 profiles, which
hopefully it would do very quickly, some programming
is required which would in fact merge the profiles for
running, making this a true batch job and presumably
reducing the costs further. £200 would be required
for both these alterations to the system. They should
be done after a survey to determine the most desirable
form of presentation.



If more than 20 profiles are to be processed,
a more elegant file structure should be
investigated. Tell has found, using
an IBM S360/30 with 64K memory, that at about 2000
profile terms the overhead required by this additional
file handling is off-set by faster search times, and
between 5000 and 10,000 terms there is virtually no
increase in search time, using hashcoding and tree-
structure searches. Some comparison of available
systems should be made if the demand is sufficient.
The Canadian SELDOM programs, the Swedish ABACUS and
other programs designed for large scale SDI (e.g.
more than 2000 search terms per run) could fruitfully
be explored.
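
The benefit of hash-coded searching over merged profiles can be illustrated with a small sketch. This is not the method used by Tell or by any of the systems named above; it is a generic illustration, and the profile names, terms and record words are assumptions. All profile terms are placed in one hash table, so each word of a record is looked up once however many profiles are being run.

```python
from collections import defaultdict


def build_term_index(profiles):
    """Merge all profiles into one hash table: term -> set of profile ids."""
    index = defaultdict(set)
    for profile_id, terms in profiles.items():
        for term in terms:
            index[term.lower()].add(profile_id)
    return index


def profiles_hit(record_words, index):
    """Return the ids of every profile with at least one term in the record."""
    hit = set()
    for word in record_words:
        hit |= index.get(word.lower(), set())
    return hit


# Hypothetical profiles and one hypothetical title.
profiles = {
    "library_1": ["corrosion", "welding"],
    "library_2": ["welding", "plastics"],
    "library_3": ["taxation"],
}
index = build_term_index(profiles)
hits = profiles_hit("Welding of structural plastics".split(), index)
print(sorted(hits))
```

Building the table is the overhead referred to above; once it exists, the search time per record depends on the number of words in the record, not on the number of profile terms.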

5.4 Provision of standard interest profiles

An organization which aims at being cost recovery
would have difficulty in producing a library-oriented
service which is cheaper than the Oklahoma, Florida
and Canadian services, none of which is marketed in the
UK and all of which are aimed at individuals or
standard interests. (See for example, Table 7.)
Likewise none of the British in-house services are
being marketed.

A combined US - UK service is not available yet but
undoubtedly the Canadian service and others will offer
it soon, at least in North America.

It appears as if the pattern of offering a two-tier
service - both specialized profiles and standard
profiles - is economically a very practical mix
and is the one we should explore in any future work
we undertake.

Table 7. Standard Profiles for MARC

1. Oklahoma Department of Libraries

South West
Library science
Bibliography and Reference
Political science and law
Drug abuse
American Indians (planned)
Environmental science (planned)

2. Florida University Library (Nov 1971)

The MARC II-IS service includes at the present time Current
Awareness Searches (CAS), Retrospective Searches (RS), and
Standard Interest Profiles (SIP) both current and
retrospective.

A number of Standard Interest Profiles for the MARC II
search system are already available. New MARC II SIPs
will be added as demands develop. Currently available
MARC II SIPs include:

MECHANICAL ENGINEERING
TAXATION (STATE & LOCAL)
HIGHER EDUCATION
MULTI-MEDIA
LIBRARY SCIENCE
AMERICAN REVOLUTIONARY WAR PERIOD
CIVIL WAR PERIOD
AMERICAN COLONIAL PERIOD
PRIVATE AVIATION
MARKETING
SALESMANSHIP & SALES MANAGEMENT
STATE GOVERNMENT
LOCAL GOVERNMENT
JAPAN
BLACK STUDIES
CRIMINAL JUSTICE
URBAN PROBLEMS
RURAL PROBLEMS
6. Conclusions and recommendations for future work

6.1 Implications of findings about retrieval characteristics

6.1.1 Recall and precision

Based on the results given in section 4, it seems
that a retrieval system intended to aid book
selection by special librarians could provide a list
of recently published books of which about half would
be of interest to the library for which the profile
had been designed. As the total number of references
received would be small, on average, this
low precision does not seem worrying. Recall would
be better than 50%. These figures are for natural
language searching of title and subject headings using
boolean logic, which produced our best results.
Class number searching could be used in cases where
high precision is requested.

Such a service could not provide the only book
selection aid for a special library's 'core'
area but would provide a very useful supplement
to publishers' announcements and standing orders
for series known to be useful. In less central
areas of interest, where complete coverage is
not required, a MARC book selection service
providing a list of an average of 15 books per week
could save the acquisition librarian a lot of time
scanning publishers' announcements, reviews, etc.

6.1.2 Currency

The information collected about the number of
items retrieved which were already known to the
librarian has two main implications: 31% of the
UK books 'of importance' to them (probably 'core'
books) were not known and 67% of the US books were
not known. For books 'of interest' these figures
are higher especially for items appearing on
American tapes (99%) which were, by the nature of
the service, even older notifications. This would
indicate that a book notification system which
includes the Library of Congress tapes would
provide a service which libraries are not receiving.

One of our users commented that European tapes
would be more use than US tapes. We have no reason
to doubt that we would get good currency results
on these tapes since libraries get fewer
publishers' announcements from abroad and currently
rely more on less speedy forms of notification
(i.e. reviews, advertisements in foreign journals,
citations).

The (lower) precision - recall results we achieved
by searching titles alone are applicable to
Whitaker tapes. If these tapes were available,
the currency results should be better, since these
tapes contain notifications of books due to be
published, as opposed to those already published
and deposited for copyright as in the case of
MARC tapes. Would the loss in recall be offset
by the improved currency (and probably greater
cost as Whitakers would probably charge a larger
royalty fee)?

6.1.3 Profiling

Constructing a good natural language profile
representing the entire interests of a library
takes longer than constructing one for a subject
specialist. With experience one should not take
as long as we did. Also more interaction with
the subscriber would be possible and desirable
and should speed up profiling. We did not experiment
with users constructing their own profiles, although
they saw the profiles in the first week. If this
could be done effectively costs would be reduced
considerably.

Another area worth considering is automatic profile
construction using LC Subject Headings and other
indexing terms. By analysing the terms used to
index books selected as relevant over a period of
2 to 3 months, a profile could be constructed.
Any additional subject headings used on relevant
books retrieved could be analyzed by the user to
see if it were a heading of interest to them.

Until methods of reducing profiling costs are 
tested, our costings must be based on those found 
in our study. Use of known relevant items seems 
the most useful profiling tool. Lack of subject 
expertise on the part of the profile constructor 
was, not surprisingly, found to be a disadvantage. 
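
The suggestion of automatic profile construction from the subject headings of known relevant books can be sketched as a simple frequency count. This was not implemented in the project; the function name, the threshold and the sample headings below are assumptions for illustration.

```python
from collections import Counter


def suggest_profile_terms(relevant_books, min_count=2):
    """Return subject headings assigned to at least `min_count` of the
    books judged relevant, most frequent first."""
    counts = Counter(
        heading
        for headings in relevant_books
        for heading in set(headings)  # count each heading once per book
    )
    return [heading for heading, n in counts.most_common() if n >= min_count]


# Hypothetical subject headings of books judged relevant over some weeks.
relevant_books = [
    ["Corrosion and anti-corrosives", "Steel"],
    ["Steel", "Welding"],
    ["Corrosion and anti-corrosives", "Steel", "Sea-water"],
]
candidate_terms = suggest_profile_terms(relevant_books)
print(candidate_terms)
```

Headings that fall below the threshold would be the ones to refer back to the user, to decide whether they are of interest.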

6.2 Standard profiles

The main advantage of standard profiles is that the
same costs are shared by a number of institutional users.
Many tape-based retrieval services are marketing standard
profiles, including at least two LC-MARC services
(Oklahoma and Florida).

A mix of standard and specialized profiles would enable
both services to be produced at a lower cost since fixed
costs would be shared over a larger number of users.

Most tape-based SDI services offer some sort of standard
profiles at a reduced rate. This is usually an afterthought,
resulting from the file being available, from lower demand
than anticipated for individual profiles and from an awareness
that certain topics are amenable to standard profile
techniques or meet a demand.

The following factors about standard profiles need to
be determined for MARC tapes before a service can be
offered:

1. Can a good cheap standard profile be defined as a
topic broad enough to be of wide interest yet specific
enough to be defined by relatively few terms? Define
a strategy for locating a good profile.

2. What is the cost of processing a standard profile?

3. How many standard profiles would be required to cover
a library's interest (and vice versa how much of a
library's interest can be covered by a profile)?
Could a combination of standard profiles replace
a specialised profile?

4. How would the provision of standard profiles affect
the cost of a wider book notification service?

5. Would the user of standard profiles be the same as
for the specialised profiles?

6. What standard profiles are offered by other non-MARC
services?

6.3 Survey of book selection methods and needs

The small sample on which we worked can give us fairly
reliable operational data on which to base the design
of a larger system. However we still know very little
about the demand and need for a book notification service.
Before a larger experimental system is developed a survey
should be carried out in an attempt to answer the following
questions about present book selection practices and
deficiencies:

1. How much time is spent in special libraries on
book selection by what level of staff?

2. What book selection aids do they use? Which do
they pay for?

3. Are they interested only in British books? - English
language books?

4. Do they wish to scan a narrow range of subjects or
a broad one?

5. Do they buy from just a few publishers?

6. Would they prefer

high recall or high precision?
expensive individualized profiles or
cheap standard profiles?

(including price they would be willing to pay)

7. Do they think they have a problem?

8. Do they prefer card (more expensive) or
listings on A4 paper?

9. What information other than title and author
should be provided:

ISBN
DC number
LC classification number
Publisher, Imprint
Corporate author
LC Subject headings
Others?

6.4 Other recommendations for future work

If there is a need for either specialized profiles
and/or standard profiles, then some
development work and assigning of responsibility is
required.

6.4.1 Investigation of other systems for improved
performance and availability

The flexibility of the MARCAS system leaves
little to be desired from the user's point of view:
the method of entering searches is simple and
straightforward and any field can be searched.
However no claims can be made about its efficiency.
Although we have no data for comparable BNB or LC
MARC tapes, our computer costs may be higher than
those of systems searching other tape files. For
this reason, trial runs should be carried out
using other available software such as the SELDOM,
ABACUS, and AMCOS programs and perhaps trying
more generalized packages. This should begin as
soon as possible but can and should extend until
all the alternatives have been tested. This will
not require much manpower per test and the MARCAS
system can be used until a demonstrably better
system has been accepted.

6.4.2 Pilot service to fifty users for a year at a 
subsidized price of £25 

Participants will be sought from those responsible 
for book selection among industrial, government and 
academic organizations with book purchase budgets 
of various sizes. A variety of output formats will 
be used, including cards, A4 listings, various 
fields of information printed, etc. 
User reaction to the service will be monitored 
by questionnaire and interview, and through the 
participants' right to cancel the service or to 
alter their profile. 

Initially the computer processing will be carried 
out using the MARCAS programs in batch processing 
mode. Cybernet Time-Sharing Sigma 9 will be used 
in the first instance. 

Although a service offered to 50 users at £25 a 
year will have an income of £1250, a further £10,000 
will be needed to cover the costs of this part of 
the project. Manpower allocations are a problem 
since the major part of the profile construction 
work obviously must be carried out early in the 
project. 
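
The funding figures above can be checked with a short calculation. This is only a sketch: the variable names are illustrative, and it assumes that the "further £10,000" is the shortfall remaining after subscription income, so that the implied total cost is the sum of the two figures. The report itself does not state a total.

```python
# Figures stated in the text.
users = 50
price_per_user = 25        # pounds per year (subsidized price)
further_funding = 10_000   # pounds of additional support needed

# Subscription income from the pilot service.
income = users * price_per_user

# Assumption: total cost = subscription income + further funding.
implied_total_cost = income + further_funding

# Share of the implied total cost recovered from the users.
recovered_share = income / implied_total_cost

print(f"income: £{income}")
print(f"implied total cost: £{implied_total_cost}")
print(f"recovered from users: {recovered_share:.1%}")
```

On this reading, subscriptions would cover roughly one ninth of the cost of the pilot, which is consistent with describing the price as subsidized.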

A charge to the user is felt to be desirable in 
order to ensure that the participants have a real 
need for the service and will make real demands on 
it. Such a charge will also effectively exclude 
individuals from using a system designed for what 
amounts to group profiling. 

From offering such a service, we hope to gain further 
understanding of the problems involved in running 
a MARC-based book notification system. Experience 
in profile maintenance, processing and distribution 
will be gained. A pilot service should give us 
a better understanding of the viability of such a 
service and the form it should take. 

6.4.3 Profile development and evaluation 

Responsibility for continuing customer liaison 
should lie with the person developing the profiles. 
Since fringe areas of interest shift more than 
core areas, ample time should be allowed for profile 
modification and alteration. Continuous evaluation 
and customer feedback will only be possible 
if the manpower is made available from the start. 
If at all possible, the people involved in 
constructing profiles during the experimental 
period should continue to use on-line techniques 
for profile construction. There is no doubt that 
this will result in faster accumulation of 
"operational" profiles. 






References 



1. WAINWRIGHT, J. BNB MARC users in the UK; a survey. Program, 
6(4). 1972, pp. 271-283. 

2. STUDER, W.J. Computer-based selective dissemination 
of information (SDI) service for faculty using Library 
of Congress Machine-Readable Catalog (MARC) records. 
Indiana University. 1968, Ph.D. thesis. 

3. BIERMAN, K.J. and BLUE, B.J. A MARC-based SDI service. 
Journal of Library Automation, 3(4). December 1970, 
pp. 304-319. 

4. MAUERHOFF, G.R. A MARC II-based program for retrieval 
and dissemination. Journal of Library Automation, 4(3). 
September 1971, pp. 141-158. 

5. PATRINOSTRO, F.S. A survey of automated activities 
in the libraries of the United States. Vol. 1. LARC 
Assoc. 1971. 

6. ATHERTON, P. and MILLER, K.B. LC/MARC on MOLDS; an 
experiment in computer based, interactive bibliographic 
storage, search, retrieval, and processing. Journal 
of Library Automation, 3. June 1970, pp. 142-165. 

7. LINE, M.B., CUNNINGHAM, D. and EVANS, S. Experimental 
information service in the social sciences 1969-1971. 
Final report. Bath University Library. January 1972. 

8. DATTA, S. and ROBERTSON, S.E. Analysis of on-line 
searching costs: an experiment using a commercially 
available reference retrieval system (SCISEARCH). 
Aslib Research and Development Department. February 1972. 

9. CHVATAL, O.P. and OLSON, G.L. A computer-based acquisition 
system for libraries. In: Proceedings of the ASIS 34th 
Annual Meeting, Denver, November 1971. Volume 8. Greenwood 
Publishing Company. 1971, pp. 217-226. 

10. CORBETT, L. and GERMAN, J. AMCOS project stage 2: a 
computer aided integrated system using BNB MARC literature 
tapes. Program, 6(1). 1972, pp. 1-35. 

11. TELL, B.V., LARSSON, R. and LINDH, R. Information retrieval 
with the ABACUS program. IAEA Symposium on Handling of 
Nuclear Information, Vienna. 1970, pp. 183-199. 

12. KING, M., HOLLOWS, A. and BAKER, R.F. Index to standard 
profiles available in June 1972. Birmingham University 
Main Library, Science Information Office. July 1972. 

13. LEWIS, P.R. Systematic library use of British National 
Bibliography services and data. A survey of British 
practice. Journal of Librarianship, 2(4). 1970, pp. 211-226. 






Appendix II 

Potential for use with other MARC format tapes 



There are certain limitations connected with on-line 
information retrieval systems: they are expensive, and 
it is not usually possible to have large data bases 
available. Thus their main use would appear to be for 
profile construction and also for demonstrations. However, the 
MARCAS program works well and is available for use on any 
MARC II format tape with only slight modifications to the 
initial program (i.e. removal of the juvenile indicator 
and English language tests). Enquiries concerning its 
use have already been received from the Road Research 
Laboratory and the National Council of Educational 
Technology. 
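
The "slight modifications" mentioned above amount to switching off two record-level tests before a profile is matched. The following is a minimal sketch of that idea and not the MARCAS code itself: records are represented as plain dictionaries, and the keys `juvenile`, `language` and `title`, the function `select_records`, the title-only matching rule and the sample data are all hypothetical.

```python
from typing import Callable, Iterable, List, Sequence

Record = dict  # hypothetical in-memory form of one tape record


def not_juvenile(rec: Record) -> bool:
    """Pre-filter: reject records flagged as juvenile (hypothetical key)."""
    return not rec.get("juvenile", False)


def is_english(rec: Record) -> bool:
    """Pre-filter: accept only English-language records (hypothetical key)."""
    return rec.get("language") == "eng"


def select_records(
    records: Iterable[Record],
    profile_terms: Sequence[str],
    prefilters: Sequence[Callable[[Record], bool]] = (),
) -> List[Record]:
    """Return records that pass every pre-filter and whose title
    contains at least one profile term (case-insensitive)."""
    hits = []
    for rec in records:
        if not all(test(rec) for test in prefilters):
            continue
        title = rec.get("title", "").lower()
        if any(term.lower() in title for term in profile_terms):
            hits.append(rec)
    return hits


# Invented sample records, for illustration only.
sample = [
    {"title": "Road surface materials", "language": "eng", "juvenile": False},
    {"title": "Roads for young readers", "language": "eng", "juvenile": True},
    {"title": "Strassenbau und road design", "language": "ger", "juvenile": False},
]

# Both tests applied, as described for the initial program.
with_tests = select_records(sample, ["road"], (not_juvenile, is_english))

# Tests removed, as suggested for other MARC II format tapes.
without_tests = select_records(sample, ["road"])

print(len(with_tests), len(without_tests))
```

Passing the tests as an optional list keeps the matching step unchanged, so the same routine could in principle serve tapes for which a juvenile or language test is not meaningful.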

The following tapes are available or will shortly be made 
available, in MARC II or compatible format (B.S. 4748: 1971): 

Bibliography of Agriculture 
CAB 

COMPENDEX 

Current Index to Conference Papers in Chemistry, Engineering, 
Life Sciences 

ERICTAPES 

INSPEC 

PANDEX 

PAIS - Psychological Abstracts 

TAB Indexes 

UNESCO 



